restricted boltzmann machine
Learning Restricted Boltzmann Machines with Sparse Latent Variables
Restricted Boltzmann Machines (RBMs) are a common family of undirected graphical models with latent variables. An RBM is described by a bipartite graph, with all observed variables in one layer and all latent variables in the other. We consider the task of learning an RBM given samples generated according to it. The best algorithms for this task currently have time complexity $\tilde{O}(n^2)$ for ferromagnetic RBMs (i.e., with attractive potentials) but $\tilde{O}(n^d)$ for general RBMs, where $n$ is the number of observed variables and $d$ is the maximum degree of a latent variable. Let the \textit{MRF neighborhood} of an observed variable be its neighborhood in the Markov Random Field of the marginal distribution of the observed variables. In this paper, we give an algorithm for learning general RBMs with time complexity $\tilde{O}(n^{2^s+1})$, where $s$ is the maximum number of latent variables connected to the MRF neighborhood of an observed variable. This is an improvement when $s < \log_2 (d-1)$, which corresponds to RBMs with sparse latent variables. Furthermore, we give a version of this learning algorithm that recovers a model with small prediction error and whose sample complexity is independent of the minimum potential in the Markov Random Field of the observed variables. This is of interest because the sample complexity of current algorithms scales with the inverse of the minimum potential, which cannot be controlled in terms of natural properties of the RBM.
- North America > United States (0.68)
- North America > Canada (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Colorado (0.04)
- North America > Canada (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines
Training Restricted Boltzmann Machines (RBMs) has been challenging for a long time due to the difficulty of computing precisely the log-likelihood gradient. Over the past decades, many works have proposed more or less successful training recipes but without studying the crucial quantity of the problem: the mixing time, i.e. the number of Monte Carlo iterations needed to sample new configurations from a model. In this work, we show that this mixing time plays a crucial role in the dynamics and stability of the trained model, and that RBMs operate in two well-defined regimes, namely equilibrium and out-of-equilibrium, depending on the interplay between this mixing time of the model and the number of steps, k, used to approximate the gradient. We further show empirically that this mixing time increases with the learning, which often implies a transition from one regime to another as soon as k becomes smaller than this time. In particular, we show that using the popular k (persistent) contrastive divergence approaches, with k small, the dynamics of the learned model are extremely slow and often dominated by strong out-of-equilibrium effects. On the contrary, RBMs trained in equilibrium display faster dynamics, and a smooth convergence to dataset-like configurations during the sampling. Finally we discuss how to exploit in practice both regimes depending on the task one aims to fulfill: (i) short k can be used to generate convincing samples in short learning times, (ii) large k (or increasingly large) is needed to learn the correct equilibrium distribution of the RBM. Finally, the existence of these two operational regimes seems to be a general property of energy based models trained via likelihood maximization.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Spain > Galicia > Madrid (0.04)
- North America > United States > Colorado (0.04)
- (2 more...)
Diluting Restricted Boltzmann Machines
Recent advances in artificial intelligence have relied heavily on increasingly large neural networks, raising concerns about their computational and environmental costs. This paper investigates whether simpler, sparser networks can maintain strong performance by studying Restricted Boltzmann Machines (RBMs) under extreme pruning conditions. Inspired by the Lottery Ticket Hypothesis, we demonstrate that RBMs can achieve high-quality generative performance even when up to 80% of the connections are pruned before training, confirming that they contain viable sub-networks. However, our experiments reveal crucial limitations: trained networks cannot fully recover lost performance through retraining once additional pruning is applied. We identify a sharp transition above which the generative quality degrades abruptly when pruning disrupts a minimal core of essential connections. Moreover, re-trained networks remain constrained by the parameters originally learned performing worse than networks trained from scratch at equivalent sparsity levels. These results suggest that for sparse networks to work effectively, pruning should be implemented early in training rather than attempted afterwards. Our findings provide practical insights for the development of efficient neural architectures and highlight the persistent influence of initial conditions on network capabilities.
- Health & Medicine (0.46)
- Leisure & Entertainment (0.35)
Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines
Training Restricted Boltzmann Machines (RBMs) has been challenging for a long time due to the difficulty of computing precisely the log-likelihood gradient. Over the past decades, many works have proposed more or less successful training recipes but without studying the crucial quantity of the problem: the mixing time, i.e. the number of Monte Carlo iterations needed to sample new configurations from a model. In this work, we show that this mixing time plays a crucial role in the dynamics and stability of the trained model, and that RBMs operate in two well-defined regimes, namely equilibrium and out-of-equilibrium, depending on the interplay between this mixing time of the model and the number of steps, k, used to approximate the gradient. We further show empirically that this mixing time increases with the learning, which often implies a transition from one regime to another as soon as k becomes smaller than this time. In particular, we show that using the popular k (persistent) contrastive divergence approaches, with k small, the dynamics of the learned model are extremely slow and often dominated by strong out-of-equilibrium effects. On the contrary, RBMs trained in equilibrium display faster dynamics, and a smooth convergence to dataset-like configurations during the sampling. Finally we discuss how to exploit in practice both regimes depending on the task one aims to fulfill: (i) short k can be used to generate convincing samples in short learning times, (ii) large k (or increasingly large) is needed to learn the correct equilibrium distribution of the RBM. Finally, the existence of these two operational regimes seems to be a general property of energy based models trained via likelihood maximization.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Spain > Galicia > Madrid (0.04)
- North America > United States > Colorado (0.04)
- (2 more...)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Colorado (0.04)
- North America > Canada (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
Investigation of D-Wave quantum annealing for training Restricted Boltzmann Machines and mitigating catastrophic forgetting
El-Yazizi, Abdelmoula, Koshka, Yaroslav
Modest statistical differences between the sampling performances of the D-Wave quantum annealer (QA) and the classical Markov Chain Monte Carlo (MCMC), when applied to Restricted Boltzmann Machines (RBMs), are explored to explain, and possibly address, the absence of significant and consistent improvements in RBM trainability when the D-Wave sampling was used in previous investigations. A novel hybrid sampling approach, combining the classical and the QA contributions, is investigated as a promising way to benefit from the modest differences between the two sampling methods. No improvements in the RBM training are achieved in this work, thereby suggesting that the differences between the QA-based and MCMC sampling, mainly found in the medium-to-low probability regions of the distribution, which are less important for the quality of the sample, are insufficient to benefit the training. Difficulties in achieving sufficiently high quality of embedding RBMs into the lattice of the newer generation of D-Wave hardware could be further complicating the task. On the other hand, the ability to generate samples of sufficient variety from lower-probability parts of the distribution has a potential to benefit other machine learning applications, such as the mitigation of catastrophic forgetting (CF) during incremental learning. The feasibility of using QA-generated patterns of desirable classes for CF mitigation by the generative replay is demonstrated in this work for the first time. While the efficiency of the CF mitigation using the D-Wave QA was comparable to that of the classical mitigation, both the speed of generating a large number of distinct desirable patterns and the potential for further improvement make this approach promising for a variety of challenging machine learning applications.
- North America > United States > Mississippi (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > New York (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
DRBM-ClustNet: A Deep Restricted Boltzmann-Kohonen Architecture for Data Clustering
Senthilnath, J., G, Nagaraj, C, Sumanth Simha, Kulkarni, Sushant, Thapa, Meenakumari, M, Indiramma, Benediktsson, Jón Atli
A Bayesian Deep Restricted Boltzmann-Kohonen architecture for data clustering termed as DRBM-ClustNet is proposed. This core-clustering engine consists of a Deep Restricted Boltzmann Machine (DRBM) for processing unlabeled data by creating new features that are uncorrelated and have large variance with each other. Next, the number of clusters are predicted using the Bayesian Information Criterion (BIC), followed by a Kohonen Network-based clustering layer. The processing of unlabeled data is done in three stages for efficient clustering of the non-linearly separable datasets. In the first stage, DRBM performs non-linear feature extraction by capturing the highly complex data representation by projecting the feature vectors of $d$ dimensions into $n$ dimensions. Most clustering algorithms require the number of clusters to be decided a priori, hence here to automate the number of clusters in the second stage we use BIC. In the third stage, the number of clusters derived from BIC forms the input for the Kohonen network, which performs clustering of the feature-extracted data obtained from the DRBM. This method overcomes the general disadvantages of clustering algorithms like the prior specification of the number of clusters, convergence to local optima and poor clustering accuracy on non-linear datasets. In this research we use two synthetic datasets, fifteen benchmark datasets from the UCI Machine Learning repository, and four image datasets to analyze the DRBM-ClustNet. The proposed framework is evaluated based on clustering accuracy and ranked against other state-of-the-art clustering methods. The obtained results demonstrate that the DRBM-ClustNet outperforms state-of-the-art clustering algorithms.
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- (3 more...)